Abstract. This article presents an application of the inferior Smarandache f-part
function to a particular parallel loop-scheduling problem. The product between
an upper diagonal matrix and a vector is analysed from the point of view of
parallel computation. An efficient solution to this problem is given by using the
inferior Smarandache f-part function. Finally, the efficiency of our solution is
demonstrated experimentally by presenting some computational results.


Parallel programming has been intensively developed in order to solve difficult
problems that involve either a large number of computations or a large volume of data.
These occur both in real-world applications (e.g. weather prediction) and in
theoretical problems (e.g. differential equations). Unfortunately, there is no
standard for writing parallel programs; this depends on the parallel language used or
the parallel platform on which the computation is performed. A common feature across
this diversity is the ease with which loops can be parallelised. Loops represent an
important source of parallelism, occurring in almost all scientific applications. Many
algorithms dealing with the scheduling of loop iterations to processors have been
proposed so far.


1. Introduction
Consider that there are p processors, denoted in the following by
$P_1, P_2, \ldots, P_p$, and a single parallel loop (see Figure 1).


DO PARALLEL I=1,N 
   CALL LOOP_BODY(I)
END DO
Figure 1. Single Parallel Loop

We also assume that the work of the routine loop_body(i) can be evaluated and is
given by the function $w: \mathbb{N} \to \mathbb{R}$, where $w(i) = w_i$ represents
either the number of the routine's operations or its running time (presume that
$w(0) = 0$). The total amount of work for the parallel loop is
$\sum_{i=1}^{N} w(i)$. An efficient loop-scheduling algorithm distributes this total
amount of work equally among the processors, so that each processor receives a
quantity of work equal to $\frac{1}{p} \sum_{i=1}^{N} w(i)$.


Let $l_j$ and $h_j$ be the lower and upper loop iteration bounds, $j = 1, 2, \ldots, p$,
such that processor j executes all the iterations between $l_j$ and $h_j$. These
bounds are found by distributing the work equally among the processors, using

$$\sum_{i=l_j}^{h_j} w(i) = \frac{1}{p} \sum_{i=1}^{N} w(i) \quad (\forall j = 1, 2, \ldots, p). \qquad (1)$$


Moreover, they satisfy the following conditions:

$$l_1 = 1. \qquad (2.a)$$
If we know $l_j$, then $h_j$ is given by
$$\sum_{i=l_j}^{h_j} w(i) = \frac{1}{p} \sum_{i=1}^{N} w(i) = W. \qquad (2.b)$$
$$l_{j+1} = h_j + 1. \qquad (2.c)$$


Suppose that Equation (2.b) is solved approximately from below. This means that if
we have the value $l_j$, then we find $h_j$ as follows:

$$h_j = h \iff \sum_{i=l_j}^{h} w(i) \le W < \sum_{i=l_j}^{h+1} w(i). \qquad (3)$$
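Conditions (2.a)-(2.c) together with Equation (3) amount to a greedy sweep over the
iterations. A minimal Python sketch of that sweep follows; the function name
`greedy_bounds`, the list representation of w, and the rule that the last processor
absorbs any leftover iterations are assumptions made for illustration and are not
part of the article.

```python
def greedy_bounds(w, p):
    """Return [(l_1, h_1), ..., (l_p, h_p)] for a work list w with w[i-1] = w(i).

    Illustrative sketch of Equations (2.a)-(2.c) and (3): each h_j is the
    largest h such that w(l_j) + ... + w(h) <= W, where W = total work / p.
    """
    n = len(w)
    W = sum(w) / p              # target work per processor
    bounds = []
    l = 1                       # (2.a): l_1 = 1
    for j in range(1, p + 1):
        h = l - 1
        acc = 0.0
        # (3): extend the chunk while the next iteration still fits under W
        while h < n and acc + w[h] <= W:
            acc += w[h]         # w[h] is w(h + 1) in 1-based notation
            h += 1
        if j == p:
            h = n               # assumption: last processor takes the remainder
        bounds.append((l, h))
        l = h + 1               # (2.c): l_{j+1} = h_j + 1
    return bounds


if __name__ == "__main__":
    # Work w(i) = i for ten iterations on two processors.
    print(greedy_bounds(list(range(1, 11)), 2))
```

Because every chunk is rounded down independently, the leftover work accumulates on
the last processor in this sketch, and each bound costs a scan over the iterations.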


In the following, we present an optimal parallel solution for the product between an
upper diagonal matrix and a vector. This is an important problem that occurs in many
algorithms for solving linear systems. The Smarandache inferior part function is used
to distribute the work equally among the processors.

2. The Smarandache Inferior Part Function

The inferior part function (sometimes called the floor function)
$[\,\cdot\,]: \mathbb{R} \to \mathbb{Z}$, defined by $[x] = k \iff k \le x < k+1$,
is one of the most used elementary functions. The Smarandache inferior part function
represents a natural generalisation of the floor function [Smara1]. Smarandache
proposed and studied this generalisation especially in connection with Number Theory
functions [Smara1, Smara2]. In the following, we present equations for some
Smarandache inferior part functions.


Consider $f: \mathbb{Z} \to \mathbb{R}$, a function that is strictly increasing and
satisfies $\lim_{n \to -\infty} f(n) = -\infty$ and $\lim_{n \to \infty} f(n) = \infty$.
The Smarandache f-inferior part function, denoted by
$f_{[\,]}: \mathbb{R} \to \mathbb{Z}$, is defined by

$$f_{[\,]}(x) = k \iff f(k) \le x < f(k+1). \qquad (4)$$

The function $f_{[\,]}$ is well defined because of these properties of f. When
$f(k) = k$, the floor function $[x]$ is obtained. In the following, we study the
Smarandache f-inferior part function when $f(k) = \sum_{i=1}^{k} i^{a}$.

Remark. Sometimes we will study only the positive inferior part by considering a
function $f: \mathbb{N} \to \mathbb{R}$ with $f(0) = 0$. In this case, we only
consider $f_{[\,]}: [0, \infty) \to \mathbb{Z}$.
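The definition in Equation (4), restricted to the positive case of the Remark, can be
evaluated for any strictly increasing and unbounded f by bracketing and bisection. The
sketch below is only an illustration under those assumptions; the name `inferior_part`
and the search strategy are not taken from the article.

```python
def inferior_part(f, x):
    """Return the k >= 0 with f(k) <= x < f(k + 1).

    Assumes f is strictly increasing and unbounded on the non-negative
    integers with f(0) = 0, and that x >= 0 (the setting of the Remark).
    """
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    hi = 1
    while f(hi) <= x:               # exponential search for an upper bracket
        hi *= 2
    lo = 0 if hi == 1 else hi // 2  # invariant: f(lo) <= x < f(hi)
    while hi - lo > 1:              # bisection inside the bracket
        mid = (lo + hi) // 2
        if f(mid) <= x:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(inferior_part(lambda k: k, 3.7))                 # floor function case
    print(inferior_part(lambda k: k * (k + 1) // 2, 10))   # f(k) = 1 + 2 + ... + k
```

With f(k) = k the function reduces to the ordinary floor on non-negative reals, which
matches the special case noted after Equation (4).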


Theorem 1. If $f(k) = \sum_{i=1}^{k} i$, then the Smarandache f-inferior part is given by

$$f_{[\,]}(x) = \left[ \frac{-1 + \sqrt{1 + 8x}}{2} \right], \quad \forall x \ge 0. \qquad (5)$$
Proof. The proof is obtained by starting from the double inequality
$\sum_{i=1}^{k} i \le x < \sum_{i=1}^{k+1} i$.

Observe that the equation $\frac{k(k+1)}{2} = x > 0$ has only one positive root, given by
$k = \frac{-1 + \sqrt{1 + 8x}}{2} > 0$. The following equivalences prove Theorem 1:

$$f_{[\,]}(x) = k \iff \frac{k(k+1)}{2} \le x < \frac{(k+1)(k+2)}{2}
\iff k \le \frac{-1 + \sqrt{1 + 8x}}{2} < k + 1
\iff k = \left[ \frac{-1 + \sqrt{1 + 8x}}{2} \right].$$

Thus, the equation for the Smarandache f-inferior part is
$f_{[\,]}(x) = \left[ \frac{-1 + \sqrt{1 + 8x}}{2} \right]$. ♦
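Equation (5) can be checked numerically against the defining double inequality. The
sketch below assumes a non-negative integer argument so that the square root can be
taken exactly with `math.isqrt`; the function name is illustrative only.

```python
import math


def triangular_inferior_part(x):
    """Equation (5) for a non-negative integer x.

    floor((-1 + sqrt(1 + 8x)) / 2) equals (isqrt(1 + 8x) - 1) // 2 because the
    bounds 2k + 1 <= sqrt(1 + 8x) < 2k + 3 involve integers only.
    """
    return (math.isqrt(1 + 8 * x) - 1) // 2


def triangular_inferior_part_by_definition(x):
    """Largest k with 1 + 2 + ... + k <= x, found by direct search."""
    k = 0
    while (k + 1) * (k + 2) // 2 <= x:
        k += 1
    return k


if __name__ == "__main__":
    agree = all(
        triangular_inferior_part(x) == triangular_inferior_part_by_definition(x)
        for x in range(1000)
    )
    print("closed form matches definition:", agree)
```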


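Theorem 2. If $f(k) = \sum_{i=1}^{k} i^2$, then the Smarandache f-inferior part is given by

$$f_{[\,]}(x) = \left[ -\frac{1}{2}
+ \sqrt[3]{\frac{3x}{2} + \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}}
+ \sqrt[3]{\frac{3x}{2} - \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}} \right]. \qquad (6)$$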


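Proof. We use Cardano's formula for solving the cubic equation $t^3 + pt + q = 0$. A
real root of this equation is given by

$$t = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}
+ \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}. \qquad (7)$$

The equation $\frac{k(k+1)(2k+1)}{6} = x > 0$ is transformed as follows:

$$\frac{k(k+1)(2k+1)}{6} = x \iff 2k^3 + 3k^2 + k - 6x = 0
\iff \left(k + \frac{1}{2}\right)^3 - \frac{1}{4}\left(k + \frac{1}{2}\right) - 3x = 0.$$

Applying Equation (7) with $t = k + \frac{1}{2}$, $p = -\frac{1}{4}$ and $q = -3x$, we
find that

$$k = -\frac{1}{2}
+ \sqrt[3]{\frac{3x}{2} + \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}}
+ \sqrt[3]{\frac{3x}{2} - \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}}.$$

The Smarandache f-inferior part is given by:

$$f_{[\,]}(x) = k \iff \frac{k(k+1)(2k+1)}{6} \le x < \frac{(k+1)(k+2)(2k+3)}{6}$$
$$\iff k \le -\frac{1}{2}
+ \sqrt[3]{\frac{3x}{2} + \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}}
+ \sqrt[3]{\frac{3x}{2} - \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}} < k + 1$$
$$\iff k = \left[ -\frac{1}{2}
+ \sqrt[3]{\frac{3x}{2} + \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}}
+ \sqrt[3]{\frac{3x}{2} - \sqrt{\left(\frac{3x}{2}\right)^2 - \frac{1}{1728}}} \right]. \; ♦$$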
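The closed form of Theorem 2 can be evaluated in floating point and compared with the
definition. The sketch below assumes an integer x >= 1 (so that the inner square root
is real) and adds an exact integer correction step to guard against rounding; that
correction and all function names are assumptions of this illustration, not part of
the article.

```python
import math


def square_sum(k):
    """f(k) = 1^2 + 2^2 + ... + k^2."""
    return k * (k + 1) * (2 * k + 1) // 6


def squares_inferior_part(x):
    """Equation (6) for an integer x >= 1, followed by an exact correction."""
    a = 1.5 * x
    d = math.sqrt(a * a - 1.0 / 1728.0)
    # max(..., 0.0) keeps the second cube root real if rounding makes a - d < 0
    estimate = -0.5 + (a + d) ** (1.0 / 3.0) + max(a - d, 0.0) ** (1.0 / 3.0)
    k = max(int(math.floor(estimate)), 0)
    # Correction (an addition of this sketch): enforce f(k) <= x < f(k + 1)
    while square_sum(k + 1) <= x:
        k += 1
    while square_sum(k) > x:
        k -= 1
    return k


def squares_inferior_part_by_definition(x):
    """Largest k with 1^2 + ... + k^2 <= x, found by direct search."""
    k = 0
    while square_sum(k + 1) <= x:
        k += 1
    return k


if __name__ == "__main__":
    agree = all(
        squares_inferior_part(x) == squares_inferior_part_by_definition(x)
        for x in range(1, 5000)
    )
    print("closed form (with correction) matches definition:", agree)
```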


3. An Efficient Algorithm for the Upper Diagonal Matrix-Vector Product

In this section, we present an efficient algorithm for the product $y = a \cdot x$
between an upper diagonal matrix
$a = (a_{i,j})_{i,j=1,\ldots,n} \in M_n(\mathbb{R})$ and a vector
$x \in \mathbb{R}^n$. This problem is quite important, as it occurs in several other
important problems such as solving linear systems or LUP matrix decomposition.

Because a is an upper diagonal matrix, the product $y = a \cdot x$ is given by

$$y_i = \sum_{j=1}^{i} a_{i,j} \, x_j, \quad \forall i = 1, 2, \ldots, n. \qquad (8)$$


The product can be computed in parallel by using the simple computation shown below.

DO PARALLEL i=1,n
   y(i) = 0
   DO j=1,i
      y(i) = y(i) + a(i,j) * x(j)
   END DO
END DO
Figure 2. Parallel Computation for the Upper Matrix-Vector Product.


For this parallel loop we have the following elements:
• The work of iteration i is $w(i) = i$, $i = 1, 2, \ldots, n$; the total work is
  $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.
• The quantity of work received by a processor should be approximately equal
  to $W = \frac{n(n+1)}{2p}$.


The difficult part of an efficient loop-scheduling algorithm is how Equation (1)
is implemented. Finding the upper bounds from this equation is quite expensive and
can be done in $O(\log n + \frac{n}{p})$ [Jaja]. However, we want to find the upper
bounds in at most O(p) complexity, and we show that this is possible for our problem.
For that we use the following theorem.

Theorem 3. The parallel computation for the upper matrix-vector product can be
scheduled efficiently on the processors (with respect to Equation (1)) by using the
following upper bounds:

$$h_j = \left[ \frac{-1 + \sqrt{1 + 4 j \, \frac{n(n+1)}{p}}}{2} \right], \quad j = 1, 2, \ldots, p. \qquad (9)$$


Proof. The Smarandache f-inferior part function presented in Theorem 1 is used to
obtain the proof. We found that if $f(k) = \sum_{i=1}^{k} i$ then
$f_{[\,]}(x) = \left[ \frac{-1 + \sqrt{1 + 8x}}{2} \right]$, $\forall x \ge 0$.

Since each processor receives a quantity equal to $W = \frac{n(n+1)}{2p}$, we find that
the first j-1 processors have received approximately $(j-1) \cdot W$. Thus, the upper
bound of processor j is the biggest number k such that all the work done by processors
1, 2, ..., j is approximately equal to $j \cdot W$. Mathematically, this can be written
as follows:

$$1 + 2 + \ldots + h_j \le j \cdot W < 1 + 2 + \ldots + h_j + (h_j + 1)$$
$$\iff \frac{h_j (h_j + 1)}{2} \le j \cdot W < \frac{(h_j + 1)(h_j + 2)}{2}$$
$$\iff h_j = f_{[\,]}(j \cdot W) = \left[ \frac{-1 + \sqrt{1 + 8 j W}}{2} \right]
= \left[ \frac{-1 + \sqrt{1 + 4 j \, \frac{n(n+1)}{p}}}{2} \right].$$

A more rigorous and technical explanation can be found in [Tabi]. ♦
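The bounds of Equation (9) can be computed in O(p) and checked against the double
inequality used in the proof. The following sketch is an illustration only: the name
`upper_bounds` and the use of exact integer arithmetic (`math.isqrt` on the floored
radicand, which yields the same floor) are assumptions of this example.

```python
import math


def upper_bounds(n, p):
    """Return [h_1, ..., h_p] from Equation (9) using integer arithmetic only."""
    return [
        (math.isqrt(1 + 4 * j * n * (n + 1) // p) - 1) // 2
        for j in range(1, p + 1)
    ]


def satisfies_proof_inequality(n, p):
    """Check 1 + ... + h_j <= j*W < 1 + ... + (h_j + 1) for every j.

    With W = n(n+1)/(2p), multiplying through by 2p keeps the check exact:
    h(h+1)p <= j n(n+1) < (h+1)(h+2)p.
    """
    for j, h in enumerate(upper_bounds(n, p), start=1):
        target = j * n * (n + 1)
        if not (h * (h + 1) * p <= target < (h + 1) * (h + 2) * p):
            return False
    return True


if __name__ == "__main__":
    n, p = 300, 16          # the sizes used in the article's experiment
    print(upper_bounds(n, p))
    print(satisfies_proof_inequality(n, p))
```

For j = p the radicand is $(2n+1)^2$, so the last bound is always n and the p chunks
cover every iteration.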


According to this theorem, an efficient scheduling is obtained using the upper bounds
from Equation (9). By construction, these bounds give the closest approximation from
below of the cumulative work $j \cdot W$ required by Equation (1). Thus, the part of
the parallel loop scheduled on processor j is presented in Figure 3. This processor
computes all the sums of Equation (8) between $h_{j-1} + 1$ and $h_j$.

DO i = [(-1+SQRT(1+4*(j-1)*n*(n+1)/p))/2]+1, [(-1+SQRT(1+4*j*n*(n+1)/p))/2]
   y(i) = 0
   ! the inner index is named k because j denotes the processor
   DO k=1,i
      y(i) = y(i) + a(i,k) * x(k)
   END DO
END DO

Figure 3. Computation of Processor j.
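The per-processor computation of Figure 3 can be simulated sequentially to confirm
that the p chunks together produce the whole product of Equation (8). This is a
hedged sketch: the helper names `h` and `processor_rows`, the nested-list matrix
layout and the dictionary result are assumptions of the example, and no real
parallelism is involved.

```python
import math


def h(j, n, p):
    """Upper bound h_j from Equation (9); h(0, n, p) is 0."""
    return (math.isqrt(1 + 4 * j * n * (n + 1) // p) - 1) // 2


def processor_rows(a, x, j, p):
    """Rows of y = a*x assigned to processor j (1-based), as {i: y_i}."""
    n = len(x)
    result = {}
    for i in range(h(j - 1, n, p) + 1, h(j, n, p) + 1):
        # a[i-1][k-1] holds a_{i,k}; only the entries with k <= i are used
        result[i] = sum(a[i - 1][k - 1] * x[k - 1] for k in range(1, i + 1))
    return result


if __name__ == "__main__":
    n, p = 7, 3
    a = [[10 * (i + 1) + (k + 1) if k <= i else 0 for k in range(n)]
         for i in range(n)]
    x = list(range(1, n + 1))
    y = {}
    for j in range(1, p + 1):
        y.update(processor_rows(a, x, j, p))
    print([y[i] for i in range(1, n + 1)])
```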


4. Computational Results and Final Conclusions

This section presents some computational results of scheduling the parallel loop from
Figure 3. In order to check that the proposed method is efficient from a practical
point of view, two other scheduling algorithms are used for comparison. The first
scheduling algorithm, named uniform scheduling, divides the parallel loop into p
chunks of the same size $\left[ \frac{n}{p} \right]$. Obviously, this is the simplest
scheduling strategy, but it is inefficient because all the big sums are computed on
processor p. The second scheduling algorithm, named interleaving, distributes the
iterations to the processors in steps of p, so that a processor does not compute two
consecutive iterations. This scheduling distributes the large work items equally
among the processors. All the algorithms have been executed on an SGI Power Challenge
2000 parallel machine with 16 processors for an upper diagonal matrix of dimension
300. The running times are presented in Table 1.


Table 1. Computational Times for three Scheduling Algorithms.
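The three strategies can also be compared under the work model $w(i) = i$ by computing
the largest amount of work assigned to any single processor. The sketch below does
this for illustration; it counts work units only and does not reproduce the measured
times of Table 1. The function names and the rounding of the uniform chunk size up to
the next integer are assumptions of this example.

```python
import math


def max_load(chunks):
    """Largest per-processor work when iteration i costs w(i) = i."""
    return max(sum(chunk) for chunk in chunks)


def uniform(n, p):
    """p consecutive chunks of equal size (size rounded up: an assumption)."""
    size = -(-n // p)
    return [range(j * size + 1, min((j + 1) * size, n) + 1) for j in range(p)]


def interleaved(n, p):
    """Processor j receives iterations j, j + p, j + 2p, ..."""
    return [range(j, n + 1, p) for j in range(1, p + 1)]


def balanced(n, p):
    """Chunks h_{j-1} + 1 .. h_j with h_j taken from Equation (9)."""
    hs = [0] + [(math.isqrt(1 + 4 * j * n * (n + 1) // p) - 1) // 2
                for j in range(1, p + 1)]
    return [range(hs[j - 1] + 1, hs[j] + 1) for j in range(1, p + 1)]


if __name__ == "__main__":
    n, p = 300, 16
    print("ideal work per processor:", n * (n + 1) / (2 * p))
    for name, strategy in (("uniform", uniform),
                           ("interleaving", interleaved),
                           ("balanced", balanced)):
        print(name, max_load(strategy(n, p)))
```

In this model the uniform strategy has the largest maximum load, since the last chunks
contain the longest sums, while the balanced and interleaving strategies both stay
close to the ideal value.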


The first important remark is that there is no way to develop efficient methods in
Computer Science without Mathematics, and this article is evidence of that. Using a
special function named the Smarandache inferior part, it has been possible to find an
efficient scheduling algorithm for the upper diagonal matrix-vector product.


The second important remark is that the scheduling proposed in this article is efficient
in practice as well. Table 1 shows that the times in the line for the balanced
scheduling are the smallest. It can be seen that the interleaving strategy also offers
good times. Table 1 also shows that the uniform strategy gives the largest times.